This article is part of Robert Sheldon's continuing series on MongoDB.
Similar to other NoSQL database systems, MongoDB is known for its flexible and variable schema models, unlike relational database systems in which well-defined schemas are essential to ensuring data integrity. In MongoDB, you can add documents to a collection that contain different fields or that include the same fields but with different data types or value ranges. You can even add documents that are completely unrelated. As long as you use proper Binary JSON (BSON) formatting, just about anything goes.
In some cases, however, this flexibility can get to be a little too much, and you might want to impose restrictions on a collection’s documents. In this way, you can better control how data is stored and presented so your applications experience the data in a consistent and reliable manner. For example, you might want to ensure that all documents added to a collection include the name field and that the field always takes a string value.
You can impose such restrictions on a collection’s documents by defining schema validation rules that specify the acceptable fields and their values. MongoDB’s validation capabilities are flexible and simple to implement and can be easily modified when needed. You can also create rules at a very granular level, even if it’s only a single field value. By default, MongoDB applies the rules to new documents as they’re inserted into the collection and to existing documents when they’re updated. If a document violates those rules, MongoDB rejects the operation.
In this article, I demonstrate how to define validation rules on a collection. The article provides multiple examples of schema definitions that contain different types of validation rules. This is the first of two articles on schema validation. By the end of this article, you should have a good sense of how validation rules work and how to add them to your collections.
Note: For the examples in this article, I used the same MongoDB Atlas and MongoDB Compass environments I used for the previous articles in this series. Refer to the first article for more specifics about setting up these environments.
Introducing the JSON Schema object
You can use a MongoDB Shell command to define schema validation on a collection. You can also use the GUI features in MongoDB Compass, but you still need to understand how to build the validation rules themselves, and MongoDB Shell is a good place to start.
When adding schema validation to a collection, you use the validator option to specify the validation rules. To define those rules as a JSON Schema object, you use the $jsonSchema operator within the validator option.
You can define the JSON Schema object when you first add your collection to the database or after the collection already exists. In either case, you specify the validator option in the same way. The following syntax shows this format:
```
validator: { $jsonSchema: { <JSON Schema object> } }
```
As the syntax shows, the $jsonSchema operator is specified as the value of the validator option. The $jsonSchema operator, which is enclosed in curly brackets, takes the JSON Schema object as its value, and that object is also enclosed in curly brackets.
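To see how these pieces nest, here is a minimal sketch of a complete collMod command document for the example mentioned in the introduction, which requires every document to include a name field that takes a string value. It is plain JavaScript, so it can be built and inspected in Node.js or MongoDB Shell before anything is sent to the server; the variable name nameRuleCommand is illustrative only.

```javascript
// Illustrative command document: every document in the candidates
// collection must include a "name" field whose value is a string.
const nameRuleCommand = {
  collMod: "candidates",              // collection to modify
  validator: {                        // the validator option...
    $jsonSchema: {                    // ...takes the $jsonSchema operator...
      bsonType: "object",             // ...whose value is the JSON Schema object
      required: ["name"],
      properties: {
        name: {
          bsonType: "string",
          description: "Field required and must be a string."
        }
      }
    }
  }
};

// In MongoDB Shell, you would send it with: db.runCommand(nameRuleCommand)
console.log(JSON.stringify(nameRuleCommand.validator));
```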
The schema object itself is based on draft 4 of the JSON Schema standard. MongoDB omits several elements from the standard, while also extending it to support MongoDB’s BSON data types. A full explanation of the JSON Schema standard and its implementation in MongoDB is beyond the scope of this article, but you can find more information about how MongoDB implements the JSON Schema in the MongoDB topic $jsonSchema.
A good way to learn how to define validation rules is to see them in action. To this end, I’ve created a series of examples that demonstrate the core components that go into a collection’s schema validation object. The examples use the hr database and candidates collection, but you can use any database or collection, preferably one that’s empty and not deployed to a production environment.
If you want to try out these examples yourself, I recommend that you stick with the hr database and candidates collection to keep things simple. At this point in the series, you should have no trouble creating a database and collection. Refer to previous articles in this series if necessary.
When I created the examples, I used the version of MongoDB Shell that’s embedded in the MongoDB Compass GUI. I like this version of Shell because I can easily verify changes to my documents by viewing them in the Compass GUI. That said, if you have MongoDB Shell installed on your system, you can instead use your system’s command-line interface (CLI) to try out these examples. The results are the same in either case.
For this article, I created the examples based on an existing collection (candidates), rather than defining them when creating the collection. This approach makes it easier to run through the examples without needing to drop the collection before re-creating it. It also makes it easier to reuse the statement as you refine the rules.
Adding validation rules to a MongoDB collection
To add validation rules to the candidates collection, we’ll start by using the runCommand database method to run the collMod database command. The command lets us modify a collection’s options, which in this case are the validation rules. Within the command, we’ll specify the validator option and, inside it, the $jsonSchema operator, which defines the JSON Schema object. The following example demonstrates how all this works:
```javascript
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      title: "candidates validation",
      required: [ "name", "dob", "position" ],
      properties: {
        "name": {
          bsonType: "string",
          description: "Field required and must be a string."
        },
        "dob": {
          bsonType: "date",
          description: "Field required and must be a date."
        },
        "position": {
          bsonType: "object",
          required: [ "title", "dept" ],
          properties: {
            "title": {
              bsonType: "string",
              description: "Field required and must be a string."
            },
            "dept": {
              bsonType: "string",
              enum: [ "R&D", "Marketing", "IT", "HR", "Finance" ],
              description: "Field required and must be one of the specified values."
            },
            "skills": {
              bsonType: "array",
              description: "Field must be an array, if included."
            },
            "yrs_exp": {
              bsonType: "int",
              minimum: 3,
              description: "Field must be an int of at least 3, if included."
            }
          }
        }
      }
    }
  }
} );
```
The entire statement is passed into MongoDB Shell as a single command. As noted earlier, the command adds the validation rules to an existing collection (candidates). If you want to add the rules when creating the collection, you must instead include the validator option in the options document that you pass to the createCollection method. The MongoDB topic Specify JSON Schema Validation shows an example of how this is done.
Returning to the example above, the opening lines of the command are fairly standard when defining validation rules:
- Invoke the runCommand method on the database object associated with the hr database. The method runs the collMod database command, whose value identifies the collection to modify: candidates.
- Specify the validator option as the second field in the collMod command document.
- Specify the $jsonSchema operator as the value of the validator option.
The remaining code, enclosed in curly brackets, defines the JSON Schema object that is passed to the $jsonSchema operator. A JSON Schema object is essentially a JSON document. Each schema element consists of a keyword followed by a value, much like a JSON document in which each field name is followed by the field value. The elements are organized hierarchically. In this case, the following four elements are at the top of the hierarchy:
- bsonType. Indicates the data type of that particular element. When bsonType is included as a top-level element, as it is here, the data type is object and applies to the document as a whole. This element is often omitted from the hierarchy’s top level.
- title. Provides a name for the set of validation rules. This element is often omitted from the schema definition.
- required. Indicates which fields are required in the collection’s documents. The element’s value is an array of strings that lists the field names. Any field named in the array must be included in the document. The element is omitted from the schema definition if no fields are required.
- properties. Defines the rules for specific fields, which are listed as subelements within the properties element, much like an embedded document. Each document field must adhere to the schema defined for its subelement. The properties element is omitted from the schema definition if no field rules need to be defined.
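The required keyword is easy to reason about in isolation. The following sketch approximates it on the client side with a hypothetical missingRequired() helper. It is not how MongoDB implements validation; it only shows which of the field names listed in required a given document lacks.

```javascript
// Hypothetical helper: list the names in schema.required that are absent
// from a document. It approximates only the "required" keyword.
function missingRequired(doc, schema) {
  const required = schema.required || [];
  return required.filter(
    (field) => !Object.prototype.hasOwnProperty.call(doc, field)
  );
}

const schema = { bsonType: "object", required: ["name", "dob", "position"] };

console.log(missingRequired({ name: "Drew" }, schema)); // [ 'dob', 'position' ]
```

Note that required checks only whether a field is present. A field whose value is null still counts as present; whether null is an acceptable value is a separate question that the field's bsonType rule decides.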
In this case, the top-level properties element includes three field subelements: name, dob, and position. The subelements for the name and dob fields specify the data type and provide a description for each field. When a data type is specified, the field’s value must conform to that type. The description is included in the error details that MongoDB returns when a document violates that subelement’s rules.
The third subelement applies to the position field, which is an embedded document. The position subelement includes its own subelements: bsonType, required, and properties.
In this case, too, the properties subelement is broken down further into its own field subelements: title, dept, skills, and yrs_exp. All four of these embedded subelements define the data type and provide a description. Two of them also use new keywords:
- The dept subelement includes the enum element, which defines the values that the position.dept field can include. The values are defined as an array of strings. No other values can be inserted into this field.
- The yrs_exp subelement includes the minimum element, with its value set to 3. As a result, the value inserted into the position.yrs_exp field must be 3 or greater.
Those are all the elements that make up this particular JSON Schema object. You can include fewer details in your schema definition, or you can include more. You can also include element types not shown here. The MongoDB topic $jsonSchema includes a list of available element types—or keywords—that MongoDB supports for schema validation.
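As mentioned earlier, you can also define rules like these when you create a collection. The following is a minimal sketch of that approach, assuming a new collection named candidates_v2 (a hypothetical name) that does not exist yet. The schema is kept in a variable so that the same object could also be reused in a collMod command, and the createCollection call runs only when the global db object exists, as it does in MongoDB Shell.

```javascript
// Shared schema for the hypothetical candidates_v2 collection
// (a trimmed-down version of the rules used in this article).
const candidatesSchema = {
  bsonType: "object",
  required: ["name", "dob", "position"],
  properties: {
    name: { bsonType: "string", description: "Field required and must be a string." },
    dob: { bsonType: "date", description: "Field required and must be a date." },
    position: { bsonType: "object", required: ["title", "dept"] }
  }
};

// Options document for createCollection; the validator option has the
// same shape that collMod uses.
const createOptions = { validator: { $jsonSchema: candidatesSchema } };

// `db` is a global in MongoDB Shell; outside the shell this step is skipped.
if (typeof db !== "undefined") {
  db.createCollection("candidates_v2", createOptions);
}
```

To apply the same rules to an existing collection instead, you could pass { collMod: "candidates", validator: { $jsonSchema: candidatesSchema } } to db.runCommand().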
Verifying a collection’s schema validation rules
After you define validation rules on the candidates collection, you’ll likely want to verify that the rules are working as expected. A good way to do this is to try to insert a document into the collection. For example, the following insertOne command tries to add a document that includes all the fields referenced in the validation rules:
```javascript
db.candidates.insertOne({
  "name": "Drew",
  "dob": "1973-9-12",
  "position": {
    "title": "Senior Developer",
    "dept": "R&D",
    "skills": [ "Java", "R", "Python", "PHP" ],
    "yrs_exp": 18
  }
});
```
On the surface, it might appear that you should be able to add this document with no problem. However, the validation rules specify that the dob field can take only a date value, while the command tries to add it as a string value. As a result, MongoDB returns the following error, which indicates that there is a type mismatch:
```
MongoServerError: Document failed validation
Additional information: {
  failingDocumentId: { buffer: <Buffer 66 84 1d fd 9f 07 2e ed ef c2 87 34> },
  details: {
    operatorName: '$jsonSchema',
    title: 'candidates validation',
    schemaRulesNotSatisfied: [
      {
        operatorName: 'properties',
        propertiesNotSatisfied: [
          {
            propertyName: 'dob',
            description: 'Field required and must be a date.',
            details: [
              {
                operatorName: 'bsonType',
                specifiedAs: { bsonType: 'date' },
                reason: 'type did not match',
                consideredValue: '1973-9-12',
                consideredType: 'string'
              }
            ]
          }
        ]
      }
    ]
  }
}
```
Notice that the error message includes the description element that was defined on the dob field. The message also indicates that the type did not match what was expected. To correct this issue, you need to pass in the dob value as a date type, as shown in the next example:
```javascript
db.candidates.insertOne({
  "name": "Drew",
  "dob": new Date("1973-9-12"),
  "position": {
    "title": "Senior Developer",
    "dept": "R&D",
    "skills": [ "Java", "SQL", "Python", "PHP" ],
    "yrs_exp": 18
  }
});
```
You should now be able to insert the document with no problem. MongoDB will then return a confirmation message similar to the following, although with a different insertedId value:
```
{
  acknowledged: true,
  insertedId: ObjectId('6683317721ea5e563566c9ec')
}
```
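The fix for the dob error comes down to the JavaScript value you pass. A quoted value is a string, while new Date() produces a Date object, which MongoDB Shell stores as a BSON date. One detail worth knowing: ECMAScript only guarantees how ISO 8601 strings are parsed, and it treats a date-only ISO string such as "1973-09-12" as midnight UTC. A non-ISO string such as "1973-9-12" is parsed in an implementation-specific way (typically as local midnight in Node.js), so the stored UTC value can depend on the machine’s time zone. The snippet below, which runs in Node.js or MongoDB Shell, shows both points; MongoDB Shell’s ISODate("1973-09-12") helper is another way to build the same value.

```javascript
// A quoted value is just a string, so it fails bsonType: "date".
const asString = "1973-9-12";

// new Date() produces a Date object, which MongoDB Shell stores as a BSON
// date. An ISO 8601 date-only string is parsed as midnight UTC.
const asDate = new Date("1973-09-12");

console.log(typeof asString);        // string
console.log(asDate instanceof Date); // true
console.log(asDate.toISOString());   // 1973-09-12T00:00:00.000Z
```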
You can also use an updateOne command to confirm that your validation rules are working as expected. For example, the following updateOne command tries to update the Drew document by setting the position.dept value to Dev:
```javascript
db.candidates.updateOne(
  { "name" : "Drew" },
  { $set: { "position.dept": "Dev" } }
);
```
As you’ll recall from the collection’s validation rules, the position.dept value must be one of those specified in the enum array. Dev is not in that array, so the document cannot be updated in this way. If you try to run this command, you’ll receive an error message indicating that Dev was not found in enum.
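If you run tests like these from a script rather than interactively, it can help to summarize which fields failed. The failedProperties() helper below is hypothetical and reads only the propertiesNotSatisfied entries that appear in the dob error shown earlier; the sample object is a trimmed copy of that error’s details. In MongoDB Shell, the object to pass in is assumed to be the errInfo property of the thrown MongoServerError, which is what the shell displays as "Additional information."

```javascript
// Hypothetical helper: collect the top-level propertyName values from the
// details of a $jsonSchema validation failure.
function failedProperties(errInfo) {
  const rules =
    (errInfo && errInfo.details && errInfo.details.schemaRulesNotSatisfied) || [];
  const names = [];
  for (const rule of rules) {
    for (const prop of rule.propertiesNotSatisfied || []) {
      names.push(prop.propertyName);
    }
  }
  return names;
}

// Trimmed copy of the details returned for the failed dob insert.
const sampleErrInfo = {
  details: {
    operatorName: "$jsonSchema",
    title: "candidates validation",
    schemaRulesNotSatisfied: [
      {
        operatorName: "properties",
        propertiesNotSatisfied: [
          { propertyName: "dob", description: "Field required and must be a date." }
        ]
      }
    ]
  }
};

console.log(failedProperties(sampleErrInfo)); // [ 'dob' ]
```

Because the helper looks only at the top level, a failure in a nested field such as position.dept should be reported as position; reporting the nested name would require following each entry’s own details array.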
Another way you can test your validation rules is to try to update a document by changing the position.yrs_exp value to 2, as in the following example:
```javascript
db.candidates.updateOne(
  { "name" : "Drew" },
  { $set: { "position.yrs_exp": 2 } }
);
```
In this case, the validation rules state that the position.yrs_exp value must be at least 3, so MongoDB will again prevent you from updating the document. Instead, you’ll receive an error message indicating that your comparison failed.
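Both of these rejected updates come down to the enum and minimum keywords. The following sketch approximates just those two checks on the client side, using the values from the position subelement of the schema. The checkPosition() helper and its messages are hypothetical; MongoDB reports violations in its own error format.

```javascript
// Rules copied from the position subelement of the schema.
const positionRules = {
  dept: { enum: ["R&D", "Marketing", "IT", "HR", "Finance"] },
  yrs_exp: { minimum: 3 }
};

// Hypothetical helper: return a list of rule violations for a position
// object. Absent fields are skipped, because these keywords constrain a
// field's value only when the field is present.
function checkPosition(position) {
  const problems = [];
  if ("dept" in position && !positionRules.dept.enum.includes(position.dept)) {
    problems.push(`dept "${position.dept}" is not in enum`);
  }
  if ("yrs_exp" in position && position.yrs_exp < positionRules.yrs_exp.minimum) {
    problems.push(`yrs_exp ${position.yrs_exp} is below minimum ${positionRules.yrs_exp.minimum}`);
  }
  return problems;
}

console.log(checkPosition({ title: "Senior Developer", dept: "R&D", yrs_exp: 18 })); // []
console.log(checkPosition({ title: "Senior Developer", dept: "Dev", yrs_exp: 2 }));
```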
Controlling permitted fields in a collection’s documents
By default, schema validation is concerned only with the fields that are specified within the validation rules. In other words, there is nothing to prevent you from adding other fields to your documents. For example, you might add a document to the candidates collection that includes a field describing the candidate’s personal interests or hobbies.
In some cases, however, you might want to ensure that the only fields included in a document are those defined within the properties element. You can do this by adding the additionalProperties element to your schema definition and setting its value to false, as in the following example:
```javascript
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      title: "candidates validation",
      required: [ "name", "dob", "position" ],
      additionalProperties: false,
      properties: {
        "name": {
          bsonType: "string",
          description: "Field required and must be a string."
        },
        "dob": {
          bsonType: "date",
          description: "Field required and must be a date."
        },
        "position": {
          bsonType: "object",
          required: [ "title", "dept" ],
          properties: {
            "title": {
              bsonType: "string",
              description: "Field required and must be a string."
            },
            "dept": {
              bsonType: "string",
              enum: [ "R&D", "Marketing", "IT", "HR", "Finance" ],
              description: "Field required and must be a string."
            },
            "skills": {
              bsonType: "array",
              description: "Field must be an array if included."
            },
            "yrs_exp": {
              bsonType: "int",
              minimum: 3,
              description: "Field must be an int if included."
            }
          }
        }
      }
    }
  }
} );
```
This command is the same as the previous validation example, except that it now includes the additionalProperties element. If you run this command in MongoDB Shell, the collMod command replaces the collection’s existing validation rules with the new definition. You don’t need to take any other steps to update the rules.
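Because collMod replaces the validator as a whole, each change in this article resends the complete schema. If you’re working in MongoDB Shell, one way to avoid retyping it is to keep the schema in a variable and derive new versions from it. The sketch below uses JavaScript object spread to add additionalProperties: false to a hypothetical baseSchema without modifying the original.

```javascript
// Hypothetical base schema, trimmed for brevity (the article's full
// version also defines rules for the fields inside position).
const baseSchema = {
  bsonType: "object",
  title: "candidates validation",
  required: ["name", "dob", "position"],
  properties: {
    name: { bsonType: "string" },
    dob: { bsonType: "date" },
    position: { bsonType: "object", required: ["title", "dept"] }
  }
};

// Shallow copy plus one extra top-level keyword. baseSchema is unchanged,
// but both objects still share the same nested properties object.
const strictSchema = { ...baseSchema, additionalProperties: false };

// In MongoDB Shell, the derived schema would then be sent with:
//   db.runCommand({ collMod: "candidates", validator: { $jsonSchema: strictSchema } });
console.log(baseSchema.additionalProperties, strictSchema.additionalProperties); // undefined false
```

Because the copy is shallow, changing a nested rule (for example, adding _id to properties) calls for a new properties object as well, such as { ...strictSchema, properties: { ...strictSchema.properties, _id: { bsonType: "objectId" } } }.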
Although adding the additionalProperties element is fairly straightforward, you must be careful when doing so. For example, you might expect to be able to run the following insertOne command, which tries to add another document to the candidates collection:
```javascript
db.candidates.insertOne({
  "name": "Parker",
  "dob": new Date("1982-12-2"),
  "position": {
    "title": "Data Scientist",
    "dept": "R&D",
    "yrs_exp": 14
  }
});
```
The document does not include any fields that are not defined in the properties element, nor does it appear to violate any rules defined on the individual fields. However, if you try to run this command, you’ll receive an error stating that your document contains a property (field) not defined in the properties element. The error, as it turns out, is related to the _id field.
Each document in a MongoDB collection must include the _id field. If you don’t include the field in your document definition, it is added automatically. However, the validation rules, as they’re currently defined, do not specify this field, so any document you try to add to the collection will fail validation.
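The same effect can be sketched on the client side. The hypothetical extraFields() helper below lists top-level fields that aren’t named in properties, which roughly corresponds to what additionalProperties: false rejects. Because an _id value is added to the document before validation, the sample document includes a placeholder _id to mirror what the validator actually sees.

```javascript
// Hypothetical helper: top-level fields not named in schema.properties,
// approximating what additionalProperties: false rejects.
function extraFields(doc, schema) {
  const allowed = Object.keys(schema.properties || {});
  return Object.keys(doc).filter((field) => !allowed.includes(field));
}

// Mirrors the current rules: _id is not listed in properties.
const schemaWithoutId = {
  additionalProperties: false,
  properties: { name: {}, dob: {}, position: {} }
};

// The placeholder string stands in for the ObjectId added before validation.
const parker = {
  _id: "placeholder-objectid",
  name: "Parker",
  dob: new Date("1982-12-02"),
  position: { title: "Data Scientist", dept: "R&D", yrs_exp: 14 }
};

console.log(extraFields(parker, schemaWithoutId)); // [ '_id' ]
```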
To address this issue, you must specify the _id field within the properties element, along with the other fields, as shown in the following example:
```javascript
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      title: "candidates validation",
      required: [ "_id", "name", "dob", "position" ],
      additionalProperties: false,
      properties: {
        "_id": {
          bsonType: "objectId",
          description: "Field required and must be an objectId."
        },
        "name": {
          bsonType: "string",
          description: "Field required and must be a string."
        },
        "dob": {
          bsonType: "date",
          description: "Field required and must be a date."
        },
        "position": {
          bsonType: "object",
          required: [ "title", "dept" ],
          properties: {
            "title": {
              bsonType: "string",
              description: "Field required and must be a string."
            },
            "dept": {
              bsonType: "string",
              enum: [ "R&D", "Marketing", "IT", "HR", "Finance" ],
              description: "Field required and must be a string."
            },
            "skills": {
              bsonType: "array",
              description: "Field must be an array if included."
            },
            "yrs_exp": {
              bsonType: "int",
              minimum: 3,
              description: "Field must be an int if included."
            }
          }
        }
      }
    }
  }
} );
```
As you can see, the top-level properties element now includes a subelement for the _id field, with its data type defined as objectId. I also included the _id field in the required element just for completeness.
After you update the schema, you can verify your changes by rerunning the previous insertOne command:
```javascript
db.candidates.insertOne({
  "name": "Parker",
  "dob": new Date("1982-12-2"),
  "position": {
    "title": "Data Scientist",
    "dept": "R&D",
    "yrs_exp": 14
  }
});
```
You should now be able to insert the document with no problem. However, this only shows that you can add a document that is in the expected format. Another test you can perform is to try to add a document that contains a field not defined in the properties element, as in the following example:
```javascript
db.candidates.insertOne({
  "name": "Harper",
  "dob": new Date("1967-3-25"),
  "region": "southeast",
  "position": {
    "title": "Marketing Manager",
    "dept": "Marketing",
    "yrs_exp": 22
  }
});
```
The command attempts to add a document that includes the region field. If you try to run this command, you will receive an error message stating that region is an additional property. To address this issue, you can simply remove the offending field, as in the following insertOne statement:
```javascript
db.candidates.insertOne({
  "name": "Harper",
  "dob": new Date("1967-3-25"),
  "position": {
    "title": "Marketing Manager",
    "dept": "Marketing",
    "yrs_exp": 22
  }
});
```
When you run this version of the command, you should be able to insert the document with no problem.
Working with null values when validating schema
In some cases, you might want to be able to insert a null for a field value when adding or updating a document. For example, an application might automatically default to null if a value is not known, rather than excluding the field from the document. If null values are going to be used, you must take them into account when configuring validation rules.
For example, suppose the position.title field might take a null value in some cases, as in the following updateOne command:
```javascript
db.candidates.updateOne(
  { "name" : "Harper" },
  { $set: { "position.title": null } }
);
```
If you try to run this statement, MongoDB will generate an error message stating that the type does not match. This is because the validation rules currently state that the position.title field permits string values only. However, you can easily fix this issue by updating the position.title subelement in the schema definition, as in the following example:
```javascript
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      title: "candidates validation",
      required: [ "_id", "name", "dob", "position" ],
      additionalProperties: false,
      properties: {
        "_id": {
          bsonType: "objectId",
          description: "Field required and must be an objectId."
        },
        "name": {
          bsonType: "string",
          description: "Field required and must be a string."
        },
        "dob": {
          bsonType: "date",
          description: "Field required and must be a date."
        },
        "position": {
          bsonType: "object",
          required: [ "title", "dept" ],
          properties: {
            "title": {
              bsonType: [ "string", "null" ],
              description: "Field required and must be a string or null."
            },
            "dept": {
              bsonType: "string",
              enum: [ "R&D", "Marketing", "IT", "HR", "Finance" ],
              description: "Field required and must be a string."
            },
            "skills": {
              bsonType: "array",
              description: "Field must be an array if included."
            },
            "yrs_exp": {
              bsonType: "int",
              minimum: 3,
              description: "Field must be an int if included."
            }
          }
        }
      }
    }
  }
} );
```
Notice that I’ve updated the bsonType value for the position.title subelement. The value is now an array that includes both string and null as acceptable types. After you update the schema definition, you can then try to rerun the updateOne command. MongoDB should now update the document and return the following message:
```
{
  acknowledged: true,
  insertedId: null,
  matchedCount: 1,
  modifiedCount: 1,
  upsertedCount: 0
}
```
That’s all there is to handling null values when defining validation rules. You can do this for any field in which null might be an acceptable value. That said, you need to do this only for those fields included in the properties element.
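Listing several types in bsonType means a value passes if it matches any one of them. Here is a rough client-side sketch of that rule for a few of the type aliases used in this article. The matchesBsonType() helper is hypothetical, and it checks plain JavaScript values rather than BSON; for example, it treats any integer-valued number as "int", which only approximates how MongoDB Shell encodes numbers.

```javascript
// Hypothetical, simplified checks for a few BSON type aliases.
const typeChecks = {
  string: (v) => typeof v === "string",
  null: (v) => v === null,
  int: (v) => Number.isInteger(v),
  date: (v) => v instanceof Date,
  array: (v) => Array.isArray(v),
  object: (v) =>
    v !== null && typeof v === "object" && !Array.isArray(v) && !(v instanceof Date)
};

// bsonType may be a single alias or an array of aliases; a value passes
// if it matches at least one of them.
function matchesBsonType(value, bsonType) {
  const aliases = Array.isArray(bsonType) ? bsonType : [bsonType];
  return aliases.some((alias) => {
    const check = typeChecks[alias];
    return check ? check(value) : false; // aliases not modeled here never match
  });
}

console.log(matchesBsonType(null, "string"));                // false
console.log(matchesBsonType(null, ["string", "null"]));      // true
console.log(matchesBsonType("Analyst", ["string", "null"])); // true
```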
Adding query operators to your schema validation rules
In some cases, you might want to add more logic to your validator to better control field values when inserting or updating documents. For this, you can use query operators to build expressions that apply different types of logic to your documents.
For example, suppose you want to ensure that all job candidates are at least 21 years old. You can update your schema definition to include this logic, along with the original JSON Schema object, as shown in the following example:
```javascript
db.runCommand( {
  collMod: "candidates",
  validator: {
    "$and": [
      {
        // The dob value must fall on or before the date 21 years before the
        // start date. new Date() is evaluated by MongoDB Shell when this
        // command runs, so that start date is fixed at that moment.
        "$expr": {
          "$lte": [
            "$dob",
            { $dateSubtract: { startDate: new Date(), unit: "year", amount: 21 } }
          ]
        }
      },
      {
        $jsonSchema: {
          bsonType: "object",
          title: "candidates validation",
          required: [ "_id", "name", "dob", "position" ],
          additionalProperties: false,
          properties: {
            "_id": {
              bsonType: "objectId",
              description: "Field required and must be an objectId."
            },
            "name": {
              bsonType: "string",
              description: "Field required and must be a string."
            },
            "dob": {
              bsonType: "date",
              description: "Field required and must be a date."
            },
            "position": {
              bsonType: "object",
              required: [ "title", "dept" ],
              properties: {
                "title": {
                  bsonType: [ "string", "null" ],
                  description: "Field required and must be a string or null."
                },
                "dept": {
                  bsonType: "string",
                  enum: [ "R&D", "Marketing", "IT", "HR", "Finance" ],
                  description: "Field required and must be a string."
                },
                "skills": {
                  bsonType: "array",
                  description: "Field must be an array if included."
                },
                "yrs_exp": {
                  bsonType: "int",
                  minimum: 3,
                  description: "Field must be an int if included."
                }
              }
            }
          }
        }
      }
    ]
  }
} );
```
The first thing to notice is that the validator option now begins with the $and logical operator, which takes an array of two conditions: an expression that checks the candidate’s age based on the dob field, and the JSON Schema object definition, which is the same one from the previous validation example. A document must satisfy both conditions to be inserted or updated.
The JSON Schema object should need no further explanation because nothing has changed, so let’s take a closer look at the expression, which begins with the $expr evaluation operator. The operator lets us use an aggregation expression within our query. The aggregation expression is everything enclosed in the curly brackets that follow the $expr keyword.
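That expression uses the $lte comparison operator to indicate that the dob value (represented as $dob) must be less than or equal to the value returned by the $dateSubtract operator. The $dateSubtract operator subtracts 21 years from the start date, returning a date value that is then compared to the dob value. If the dob value falls on or before that date, the document can be added. Keep in mind that new Date() is evaluated by MongoDB Shell when you run the collMod command, so the start date, and therefore the cutoff, is fixed at that moment rather than recalculated for each insert or update. To keep the cutoff current, you can rerun the command from time to time, or you can try replacing new Date() with the "$$NOW" system variable, which the server evaluates when each operation runs; check the documentation for your MongoDB version to confirm that it is supported in validators.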
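The date arithmetic can be sketched in plain JavaScript to check a few dates before relying on the validator. The helpers below are hypothetical: subtractYears() approximates $dateSubtract with unit "year" (computed in UTC, which is the operator’s default time zone), and isOldEnough() mirrors the $lte comparison. The reference date is passed in as now, which also reflects the fact that new Date() in the collMod command is evaluated once, when the command runs.

```javascript
// Subtract whole years in UTC. If the day doesn't exist in the target year
// (Feb 29), fall back to the last day of that month; this is assumed to
// match how $dateSubtract handles month-end dates.
function subtractYears(date, years) {
  const result = new Date(date.getTime());
  const month = result.getUTCMonth();
  result.setUTCFullYear(result.getUTCFullYear() - years);
  if (result.getUTCMonth() !== month) {
    result.setUTCDate(0); // day 0 = last day of the previous month
  }
  return result;
}

// Mirrors { $lte: [ "$dob", <now minus minAge years> ] }.
function isOldEnough(dob, now, minAge = 21) {
  return dob.getTime() <= subtractYears(now, minAge).getTime();
}

const now = new Date("2024-07-03T00:00:00Z"); // fixed reference date for the example
console.log(isOldEnough(new Date("2004-07-02"), now)); // false
console.log(isOldEnough(new Date("2003-07-02"), now)); // true
```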
After you update the schema definition, you can test out your changes by running the following statement.
```javascript
db.candidates.insertOne({
  "name": "Darcy",
  "dob": new Date("2004-7-2"),
  "position": {
    "title": "Developer",
    "dept": "R&D",
    "skills": [ "Java", "Csharp", "Python", "R" ],
    "yrs_exp": 3
  }
});
```
The statement uses 2004-7-2 as the dob value, which should cause the statement to fail. If that date falls more than 21 years before the day on which you ran the collMod command, you’ll need to use a more recent value. The idea is to get the command to generate an error by trying to add a dob value for a candidate who is younger than the required age. When the command fails, you should get an error message stating that the expression did not match.
Next, run the following insertOne command, which is the same as the previous one, except that the dob value is now more than 21 years in the past:
```javascript
db.candidates.insertOne({
  "name": "Darcy",
  "dob": new Date("2003-7-2"),
  "position": {
    "title": "Developer",
    "dept": "R&D",
    "skills": [ "Java", "Csharp", "Python", "R" ],
    "yrs_exp": 3
  }
});
```
The command should run with no problem because it does not violate any of the validation rules.
There are, of course, plenty of other ways you can define your expressions so you can include different logic in your validator object. You can, in fact, define an expression without including a JSON Schema object. For more information about using query operators, see the MongoDB topic Specify Validation With Query Operators.
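As an illustration, the following sketch expresses a few of this article’s rules with query operators only, using the standard $type, $in, $exists, and $gte operators. It only builds the command document; passing it to db.runCommand() in MongoDB Shell would replace the collection’s current rules, including the JSON Schema object. The variable name queryValidatorCommand is illustrative.

```javascript
// Hypothetical validator that uses query operators instead of $jsonSchema.
const queryValidatorCommand = {
  collMod: "candidates",
  validator: {
    name: { $type: "string" },
    dob: { $type: "date" },
    "position.dept": { $in: ["R&D", "Marketing", "IT", "HR", "Finance"] },
    // $or lets yrs_exp be omitted but enforces the minimum when it's present.
    $or: [
      { "position.yrs_exp": { $exists: false } },
      { "position.yrs_exp": { $gte: 3 } }
    ]
  }
};

console.log(Object.keys(queryValidatorCommand.validator));
```

Unlike the properties rules, a condition such as { name: { $type: "string" } } doesn’t match a document that lacks the field, so each field named this way is effectively required. That is why the sketch wraps the yrs_exp check in $or with $exists: false.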
Getting started with MongoDB validation rules
For the examples in this article, I used MongoDB’s default validation settings. In the next article, I plan to continue the discussion on validation rules, at which time, I’ll provide more details on how you can override the default behavior. I’ll also discuss validation rules as they apply to documents that already exist within a collection. In the meantime, I suggest you review the MongoDB topic Schema Validation, which introduces you to validation rules. I think you’ll find that schema validation is a valuable tool for working with MongoDB data, although you’ll likely want to limit its use to more mature applications when the schema is relatively stable.